Abstract: We derive an exponentially decaying upper-bound on the unnormalized amount of information leaked to the wire-tapper in Wyner's wire-tap channel setting. We characterize the exponent of the bound as a function of the randomness used by the encoder. This exponent matches that of the recent work of Hayashi [12], which is, to the best of our knowledge, the best exponent that exists in the literature. Our proof (like those of [16], [17]) is exclusively based on an i.i.d. random coding construction, while that of [12], in addition, requires the use of random universal hash functions.

I. Introduction 

Wyner [1] introduced the notion of the wire-tap channel (Fig. 1) in 1975: Alice wants to communicate a message $W \in \{1,\dots,M\}$ to Bob through a communication channel $\mathsf{V}: \mathcal{X} \to \mathcal{Y}$. Eve also has access to what Alice transmits via a wire-tapper's channel $\mathsf{W}: \mathcal{X} \to \mathcal{Z}$, and the aim of Alice is to keep the message hidden from her while maximizing the rate of information transmitted to Bob, $R = \frac{1}{n}\log M$.



Fig. 1. The Wire-Tap Channel 


To this end, Alice encodes $W$ as a codeword $\mathbf{X} \in \mathcal{X}^n$ and sends it via $n$ consecutive uses of the channel. Bob observes the output sequence of $\mathsf{V}$, $\mathbf{Y} \in \mathcal{Y}^n$, and estimates $W$ given $\mathbf{Y}$. On the other side, Eve has access to $\mathbf{Z} \in \mathcal{Z}^n$ (the output sequence of $\mathsf{W}$) and attempts to make an inference about $W$.

Wyner [1] (in the case when $\mathsf{W}$ is degraded with respect to $\mathsf{V}$) and later Csiszár and Körner [2] (in the more general context of $\mathsf{V}$ being more capable than $\mathsf{W}$) showed that, given any input distribution $P_X$, Alice can communicate reliably to Bob at any rate $R$ up to

$$ I(X;Y) - I(X;Z) \tag{1} $$

(when $(X,Y) \sim P_X(x)\mathsf{V}(y|x)$ and $(X,Z) \sim P_X(x)\mathsf{W}(z|x)$) while keeping the rate of information leaked to Eve about $W$ as small as desired, i.e., guaranteeing

$$ \frac{1}{n} I(W;\mathbf{Z}) \le \epsilon \tag{2} $$

for any $\epsilon > 0$, using sufficiently large $n$.
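As a concrete, purely illustrative instance of (1) (not taken from the paper), suppose $\mathsf{V}$ and $\mathsf{W}$ are binary symmetric channels (BSCs) with hypothetical crossover probabilities 0.05 and 0.2, so that Eve's channel is a degraded version of Bob's, and $P_X$ is uniform. Then (1) reduces to a difference of binary entropies, which the short sketch below evaluates in bits; the function names are our own.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_mutual_information(p: float) -> float:
    """I(X;Y) in bits when a uniform input is sent over a BSC with crossover p."""
    return 1.0 - h2(p)


def wiretap_rate(p_bob: float, p_eve: float) -> float:
    """The quantity (1), I(X;Y) - I(X;Z), for BSC(p_bob) to Bob and BSC(p_eve) to Eve."""
    return bsc_mutual_information(p_bob) - bsc_mutual_information(p_eve)


if __name__ == "__main__":
    # Hypothetical parameters: Bob sees BSC(0.05), Eve sees the noisier BSC(0.2).
    print(f"I(X;Y) - I(X;Z) = {wiretap_rate(0.05, 0.2):.4f} bits per channel use")
```

Only the uniform input is evaluated here; the rate in (1) depends on the chosen $P_X$.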

Wyner's measure of secrecy allows one to investigate the trade-off between the message rate and the information leakage rate, but it is too weak from the security point of view: even if the amount of information Eve learns about the message $W$, normalized to the number of channel uses, vanishes asymptotically, the amount itself can grow unboundedly as the block-length increases. Therefore, it is natural to remove the normalization factor in (2) and ask for strong secrecy:

$$ I(W;\mathbf{Z}) \le \epsilon. \tag{3} $$

Maurer and Wolf [3] showed that the highest achievable rate (1) does not change under the strong secrecy requirement.

Classical achievability constructions [1], [2] are based on associating each message $w \in \{1,\dots,M\}$ with a sub-code of size $M' = \exp(nR')$ and transmitting a randomly chosen codeword from that sub-code to communicate $w$. The reliability of the code is ensured by keeping the total rate $R' + R$ below $I(X;Y)$. Furthermore, by varying the rate $R'$ from $0$ to $I(X;Z)$, the upper-bound on the information leakage rate, $\frac{1}{n} I(W;\mathbf{Z})$, is controlled. In particular, by choosing the rate $R'$ just below $I(X;Z)$, weak secrecy is established.

An alternative way to approach the secrecy problem is to establish secrecy through channel resolvability [·]. Given an input distribution $P_X$ that induces the distribution $P_Z$ at the output of a channel $\mathsf{W}: \mathcal{X} \to \mathcal{Z}$, a code of rate $I(X;Z)$ or larger chosen from the i.i.d. $P_X$ random coding ensemble will, with high probability, induce an output distribution that approximates $P_Z^n$ when the index of the transmitted codeword is chosen uniformly at random [·].

For any fixed message $w \in \{1,\dots,M\}$, the output of Eve's channel has distribution $P_{\mathbf{Z}|W=w}$. It is not difficult to see that secrecy is guaranteed if $P_{\mathbf{Z}|W=w}$ 'well approximates' the product distribution $P_Z^n$, which is achieved by setting the sub-codes' rate $R'$ just above $I(X;Z)$. In particular, if we measure the quality of approximation by requiring the unnormalized Kullback–Leibler divergence between $P_{\mathbf{Z}|W=w}$ and $P_Z^n$ to be small, strong secrecy will be established. Indeed, in [5], [7] it has been shown that the information leakage $I(W;\mathbf{Z})$ will be exponentially small in $n$ provided that $R'$ is above $I(X;Z)$.

Definition 1. Given $R$, $R'$ and $\mathsf{W}$, a number $E$ is a secrecy exponent for the wire-tapper channel $\mathsf{W}$ if there exists a sequence of reliable coding schemes of rate $R$, requiring the entropy rate $R'$ at the encoder, for which

$$ \liminf_{n\to\infty} -\frac{1}{n}\log I(W;\mathbf{Z}) \ge E. $$

In [5], [7] the secrecy exponent is derived using the i.i.d. random coding ensemble. More specifically, each message $w \in \{1,\dots,M\}$ is associated with a sub-code whose codewords are sampled independently (and independently of the codewords of the other sub-codes) from the i.i.d. random coding ensemble. The exponent is derived by upper-bounding the ensemble-expectation of $D(P_{\mathbf{Z}|W} \| P_Z^n | P_W)$ and then concluding that there exists a sequence of codes in the ensemble using which the information leakage decays at least as fast as $\mathbb{E}[D(P_{\mathbf{Z}|W} \| P_Z^n | P_W)]$ does. The secrecy exponent of Hou and Kramer in [7] is derived based on their resolvability proof of [8, Section III-A], which is simple but results in a small exponent. However, by applying the method described in [8, Section III-B] to the wire-tap channel setting, a larger exponent can be obtained, which is equal to that of Hayashi in [5].

In [12], Hayashi uses privacy amplification to improve the secrecy exponent based on a different construction than those of [5]–[7]. In addition to a code of size $MM'$, whose codewords are sampled independently from the i.i.d. random coding ensemble, a hash function is sampled from the ensemble of universal hash functions from $\{1,\dots,MM'\}$ to $\{1,\dots,M\}$ and revealed to Alice, Bob, and Eve. A message $w \in \{1,\dots,M\}$ is communicated by sending a randomly chosen codeword from the code and then mapping the index of the sent codeword, using the hash function, to an element of $\{1,\dots,M\}$. The expected information leakage (where the expectation is taken over both the i.i.d. random coding and the universal hash function ensembles) is then upper-bounded to show that the exponent of the bound is a secrecy exponent.

In this paper, we derive an exponentially decaying upper-bound on $\mathbb{E}[D(P_{\mathbf{Z}|W=w} \| P_Z^n)]$, where the expectation is taken over the i.i.d. random coding ensemble (i.e., the construction used in [5]–[7]), by analyzing the deviations of $P_{\mathbf{Z}|W=w}$ from its mean. It then follows (by standard expurgation arguments) that for every $\epsilon > 0$ there exists a code of essentially the same rate $R$ using which $\max_w D(P_{\mathbf{Z}|W=w} \| P_Z^n) \le (1+\epsilon)\,\mathbb{E}[D(P_{\mathbf{Z}|W=w} \| P_Z^n)]$. As already noted in [·], this is a worst-case measure of secrecy, in contrast to $I(W;\mathbf{Z})$, which is an average-case measure of secrecy. In addition, this shows that our lower-bound on $\lim_{n\to\infty} -\frac{1}{n}\log \mathbb{E}[D(P_{\mathbf{Z}|W=w} \| P_Z^n)]$ is a secrecy exponent. This exponent matches that of [12], which is larger than those of [5]–[7].

II. Notation 

We use uppercase letters (like $X$) to denote a random variable and the corresponding lowercase version ($x$) for a realization of that random variable. Boldface letters denote sequences of length $n$. The $i$-th element of a sequence $\mathbf{x}$ is denoted as $x_i$. We denote finite sets by script-style uppercase letters like $\mathcal{S}$. The cardinality of a set $\mathcal{S}$ is denoted by $|\mathcal{S}|$. For a positive integer $m$, $[m] = \{1,2,\dots,m\}$. $\mathbb{R}$ denotes the set of real numbers and $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty,+\infty\}$ is the set of extended real numbers. We write $f(n) \doteq g(n)$ (resp. $f(n) \mathrel{\dot{\le}} g(n)$) if $\lim_{n\to\infty} \frac{1}{n}\log\frac{f(n)}{g(n)} = 0$ (resp. $\le 0$).

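We denote the set of distributions on alphabet $\mathcal{X}$ as $\mathcal{P}(\mathcal{X})$. If $P \in \mathcal{P}(\mathcal{X})$, $P^n \in \mathcal{P}(\mathcal{X}^n)$ denotes the product distribution $P^n(\mathbf{x}) = \prod_{i=1}^n P(x_i)$. Likewise, if $\mathsf{V}: \mathcal{X} \to \mathcal{Y}$ is a conditional distribution, $\mathsf{V}^n: \mathcal{X}^n \to \mathcal{Y}^n$ denotes the conditional distribution $\mathsf{V}^n(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^n \mathsf{V}(y_i|x_i)$.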

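We denote the type of a sequence $\mathbf{x} \in \mathcal{X}^n$ by $P_{\mathbf{x}} \in \mathcal{P}(\mathcal{X})$ and the conditional type of $\mathbf{y} \in \mathcal{Y}^n$ given $\mathbf{x} \in \mathcal{X}^n$ by $V_{\mathbf{y}|\mathbf{x}}: \mathcal{X} \to \mathcal{Y}$ (see [13, Chapter 2] for formal definitions).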


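A distribution $P \in \mathcal{P}(\mathcal{X})$ is an $n$-type if $nP(x) \in \mathbb{N}_{\ge 0}$ for all $x \in \mathcal{X}$. We denote the set of $n$-types on $\mathcal{X}$ as $\mathcal{P}_n(\mathcal{X}) \subset \mathcal{P}(\mathcal{X})$ and repeatedly use the fact that $|\mathcal{P}_n(\mathcal{X})| = O(n^{|\mathcal{X}|})$ [13, Lemma 2.2].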

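If $P \in \mathcal{P}_n(\mathcal{X})$, we denote the set of all sequences of type $P$ as $\mathcal{T}_P \subset \mathcal{X}^n$. If $V: \mathcal{X} \to \mathcal{Y}$ is a conditional distribution, the $V$-shell of $\mathbf{x} \in \mathcal{X}^n$ is denoted as $\mathcal{T}_V(\mathbf{x}) \subset \mathcal{Y}^n$.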
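The polynomial growth of $|\mathcal{P}_n(\mathcal{X})|$ can be checked directly. The sketch below (helper names are ours, for illustration only) enumerates the $n$-types on a $k$-letter alphabet through their count vectors. It compares their number with the exact stars-and-bars count $\binom{n+k-1}{k-1}$ and the cruder bound $(n+1)^k$, and verifies that the type classes $\mathcal{T}_P$ partition $\mathcal{X}^n$.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, prod


def sequence_type(x, alphabet):
    """Type P_x of the sequence x: the fraction of positions holding each symbol."""
    counts = Counter(x)
    return tuple(Fraction(counts[a], len(x)) for a in alphabet)


def compositions(n, k):
    """All k-tuples of non-negative integers summing to n (the count vectors n*P)."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def n_types(n, k):
    """The set P_n of n-types on a k-letter alphabet, as tuples of Fractions."""
    return [tuple(Fraction(c, n) for c in counts) for counts in compositions(n, k)]


def type_class_size(counts):
    """|T_P| for the type with count vector `counts` (a multinomial coefficient)."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


if __name__ == "__main__":
    n, k = 10, 3
    print(len(n_types(n, k)), comb(n + k - 1, k - 1), (n + 1) ** k)
    print(sequence_type("abca", "abc"))
```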

III. Result 

In the rest of the paper, $(X,Z) \in \mathcal{X}\times\mathcal{Z}$ denotes the pair of random variables whose joint distribution is $P_{X,Z}(x,z) = P_X(x)\mathsf{W}(z|x)$, where $P_X$ is a fixed input distribution. For simplicity (and with no essential loss of generality) we assume that $\mathrm{supp}(P_X) = \mathcal{X}$ and $\mathrm{supp}(P_Z) = \mathcal{Z}$.¹

Following [·], we consider the following random code construction: for every message $w \in [M]$, a codebook of size $M' = \exp(nR')$, denoted by $\mathcal{C}_w$, is constructed by sampling $M'$ codewords $\mathbf{X}_{w,w'}$, $w' \in [M']$, independently from the product distribution $P_X^n$. In order to communicate the message $w$, Alice picks $w' \in [M']$ uniformly at random and transmits $\mathbf{X}_{w,w'}$. Given such a construction, for every $w \in [M]$ and $\mathbf{z} \in \mathcal{Z}^n$, the conditional output distribution of $\mathsf{W}$ is

$$ P_{\mathbf{Z}|W}(\mathbf{z}|w) = \frac{1}{M'}\sum_{w'=1}^{M'} \mathsf{W}^n(\mathbf{z}|\mathbf{X}_{w,w'}), \tag{4} $$

which is an average of i.i.d. random variables, and

$$ \mathbb{E}[P_{\mathbf{Z}|W}(\mathbf{z}|w)] = P_Z^n(\mathbf{z}), \quad \forall w \in [M]. \tag{5} $$
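A small Monte Carlo sketch of this construction, not part of the paper: it assumes binary alphabets, a uniform $P_X$ and a BSC with crossover $q$ for $\mathsf{W}$, with hypothetical toy values $n = 4$, $M' = 8$, $q = 0.2$. It draws random sub-codes, evaluates (4) exactly on all of $\mathcal{Z}^n$, and checks that the empirical average of $P_{\mathbf{Z}|W}(\mathbf{z}|w)$ is close to $P_Z^n(\mathbf{z})$, as (5) states. It also reports the average of $D(P_{\mathbf{Z}|W=w}\|P_Z^n)$, the quantity bounded in Theorem 1.

```python
import itertools
import math
import random


def bsc_prob(z, x, q):
    """W^n(z|x) for a memoryless BSC with crossover probability q (hypothetical channel)."""
    flips = sum(zi != xi for zi, xi in zip(z, x))
    return q ** flips * (1 - q) ** (len(z) - flips)


def subcode_output(codebook, q, outputs):
    """P_{Z|W}(.|w) as in (4): the average of W^n(.|x) over the sub-code's codewords."""
    return {z: sum(bsc_prob(z, x, q) for x in codebook) / len(codebook) for z in outputs}


def kl_nats(p, r):
    """D(p || r) in nats for distributions given as dicts over the same finite set."""
    return sum(pz * math.log(pz / r[z]) for z, pz in p.items() if pz > 0)


def simulate(n=4, m_prime=8, q=0.2, trials=2000, seed=0):
    """Average P_{Z|W=w} and D(P_{Z|W=w} || P_Z^n) over random sub-codes."""
    rng = random.Random(seed)
    outputs = list(itertools.product((0, 1), repeat=n))
    p_z_n = {z: 0.5 ** n for z in outputs}  # uniform P_X through a BSC gives uniform P_Z^n
    mean_out = dict.fromkeys(outputs, 0.0)
    mean_div = 0.0
    for _ in range(trials):
        # Sample the M' codewords of C_w independently from P_X^n (i.i.d. fair bits).
        codebook = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m_prime)]
        out = subcode_output(codebook, q, outputs)
        for z in outputs:
            mean_out[z] += out[z] / trials
        mean_div += kl_nats(out, p_z_n) / trials
    return mean_out, p_z_n, mean_div


if __name__ == "__main__":
    mean_out, p_z_n, mean_div = simulate()
    worst = max(abs(mean_out[z] - p_z_n[z]) for z in p_z_n)
    print(f"max |mean P_Z|W - P_Z^n| ~ {worst:.2e}, mean D ~ {mean_div:.3f} nats")
```

With such a small $n$ the estimate says nothing about exponents; it only makes the objects in (4) and (5) tangible.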

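Theorem 1. Using the aforementioned construction, for all $w \in [M]$,

$$ \mathbb{E}[D(P_{\mathbf{Z}|W=w} \| P_Z^n)] \mathrel{\dot{\le}} \exp[-nE_s(P_X,\mathsf{W},R')], $$

with

$$ E_s(P_X,\mathsf{W},R') = \max_{0\le\lambda\le1}\{\lambda R' - F_0(P_X,\mathsf{W},\lambda)\}, \tag{6} $$

where

$$ F_0(P_X,\mathsf{W},\lambda) = \log\Big[\sum_{z\in\mathcal{Z}} P_Z(z)\sum_{x\in\mathcal{X}} P_{X|Z}(x|z)^{1+\lambda} P_X(x)^{-\lambda}\Big]. $$

Remark. $F_0(P_X,\mathsf{W},\lambda)$ is a convex function of $\lambda$ (cf. Appendix E-B) passing through the origin with slope

$$ \frac{\partial}{\partial\lambda} F_0(P_X,\mathsf{W},\lambda)\Big|_{\lambda=0} = I(X;Z). $$

By convexity, $F_0(P_X,\mathsf{W},\lambda) \ge \lambda I(X;Z)$ for all $\lambda$. Hence $E_s(P_X,\mathsf{W},R') \ge 0$, with equality iff $R' \le I(X;Z)$.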
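The exponent (6) is easy to evaluate numerically. A minimal sketch, assuming natural logarithms (so rates are in nats), a channel given as a matrix `w[x][z]`, and a uniform grid over $\lambda \in [0,1]$ in place of an exact maximization; the function names are illustrative. It also checks the two facts stated in the Remark: $F_0(P_X,\mathsf{W},0) = 0$ and slope $I(X;Z)$ at the origin.

```python
import math


def output_dist(p_x, w):
    """P_Z(z) = sum_x P_X(x) W(z|x); w[x][z] is the channel matrix."""
    return [sum(p_x[x] * w[x][z] for x in range(len(p_x))) for z in range(len(w[0]))]


def f0(p_x, w, lam):
    """F_0(P_X, W, lambda) of Theorem 1, in nats."""
    p_z = output_dist(p_x, w)
    total = 0.0
    for z, pz in enumerate(p_z):
        for x, px in enumerate(p_x):
            post = px * w[x][z] / pz  # P_{X|Z}(x|z)
            if post > 0:
                total += pz * post ** (1 + lam) * px ** (-lam)
    return math.log(total)


def mutual_information(p_x, w):
    """I(X;Z) in nats."""
    p_z = output_dist(p_x, w)
    return sum(p_x[x] * w[x][z] * math.log(w[x][z] / p_z[z])
               for x in range(len(p_x)) for z in range(len(p_z)) if w[x][z] > 0)


def secrecy_exponent(p_x, w, r_prime, grid=1000):
    """E_s of (6): max over lambda in [0, 1] of lambda*R' - F_0, on a uniform grid."""
    return max(k / grid * r_prime - f0(p_x, w, k / grid) for k in range(grid + 1))


if __name__ == "__main__":
    p_x = [0.5, 0.5]
    w = [[0.8, 0.2], [0.2, 0.8]]  # hypothetical BSC(0.2) wire-tapper channel
    i_xz = mutual_information(p_x, w)
    for r_prime in (0.5 * i_xz, i_xz, 1.5 * i_xz, 2 * i_xz):
        print(f"R' = {r_prime:.4f} nats -> E_s ~ {secrecy_exponent(p_x, w, r_prime):.5f}")
```

As the Remark predicts, the computed exponent is (numerically) zero for $R' \le I(X;Z)$ and becomes positive once $R'$ exceeds $I(X;Z)$.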

The only random quantity involved in the divergence $D(P_{\mathbf{Z}|W=w}\|P_Z^n)$ is the conditional distribution $P_{\mathbf{Z}|W=w}$, whose expectation is $P_Z^n$ as shown in (5). To prove Theorem 1, we shall analyze the deviations of the random variables $P_{\mathbf{Z}|W}(\mathbf{z}|w)$ from their mean, $P_Z^n(\mathbf{z})$.

As an immediate corollary to Theorem 1 we have:


¹ The second assumption follows from the first together with the assumption that for every $z \in \mathcal{Z}$ there exists at least one $x$ such that $\mathsf{W}(z|x) > 0$.







Corollary 2. For any input distribution $P_X$ and a pair of rates $R$ and $R'$, there exists a reliable code of rate $R$ using which, for any message distribution $P_W$,

$$ P_e \mathrel{\dot{\le}} \exp[-nE_r(P_X,\mathsf{V},R+R')], $$

$$ I(W;\mathbf{Z}) \mathrel{\dot{\le}} \exp[-nE_s(P_X,\mathsf{W},R')], $$

where $P_e$ denotes the decoding error probability of Bob and $E_r$ is Gallager's random coding exponent [14, Chapter 5]. Hence, for $(R,R')$ such that $R + R' < I(X;Y)$, the $E_s$ in Theorem 1 is a secrecy exponent.

Corollary 2 is proved in Appendix B.

IV. Proof of Theorem 1

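For all $w \in [M]$ and $\mathbf{z} \in \mathcal{Z}^n$ let

$$ U_n(\mathbf{z}|w) = \frac{P_{\mathbf{Z}|W}(\mathbf{z}|w)}{P_Z^n(\mathbf{z})}. \tag{7} $$

Using (5), it is easy to see that $\mathbb{E}[U_n(\mathbf{z}|w)] = 1$. Using the linearity of expectation, we have:

$$ \begin{aligned} \mathbb{E}[D(P_{\mathbf{Z}|W=w}\|P_Z^n)] &= \sum_{\mathbf{z}\in\mathcal{Z}^n} \mathbb{E}\Big[P_{\mathbf{Z}|W}(\mathbf{z}|w)\log\frac{P_{\mathbf{Z}|W}(\mathbf{z}|w)}{P_Z^n(\mathbf{z})}\Big] \\ &= \sum_{\mathbf{z}\in\mathcal{Z}^n} P_Z^n(\mathbf{z})\,\mathbb{E}[U_n(\mathbf{z}|w)\log U_n(\mathbf{z}|w)] \\ &= \sum_{P\in\mathcal{P}_n(\mathcal{Z})}\sum_{\mathbf{z}\in\mathcal{T}_P} P_Z^n(\mathbf{z})\,\mathbb{E}[U_n(\mathbf{z}|w)\log U_n(\mathbf{z}|w)]. \end{aligned} \tag{8} $$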

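To prove Theorem 1 we shall use the following result.

Lemma 3. For $P \in \mathcal{P}(\mathcal{Z})$, let

$$ G_0(P_{X,Z},P,\lambda) = \sum_{z\in\mathcal{Z}} P(z)\log\Big[\sum_{x\in\mathcal{X}} P_{X|Z}(x|z)^{1+\lambda}P_X(x)^{-\lambda}\Big] \tag{9} $$

and

$$ E_t(P_{X,Z},R',P) = \max_{0\le\lambda\le1}\{\lambda R' - G_0(P_{X,Z},P,\lambda)\}. \tag{10} $$

Then, for every $w \in [M]$ and $\mathbf{z} \in \mathcal{Z}^n$,

$$ \mathbb{E}[U_n(\mathbf{z}|w)\log U_n(\mathbf{z}|w)] \mathrel{\dot{\le}} \exp[-nE_t(P_{X,Z},R',P_{\mathbf{z}})], \tag{11} $$

where $P_{\mathbf{z}}$ is the type of $\mathbf{z}$.

Having proved Lemma 3, Theorem 1 follows by using (11) in (8) and [13, Lemma 2.6] (which gives $P_Z^n(\mathcal{T}_P) \doteq \exp[-nD(P\|P_Z)]$) to conclude

$$ \mathbb{E}[D(P_{\mathbf{Z}|W=w}\|P_Z^n)] \mathrel{\dot{\le}} \exp[-nE_s(P_X,\mathsf{W},R')], $$

where

$$ E_s(P_X,\mathsf{W},R') = \min_{P\in\mathcal{P}(\mathcal{Z})}\{D(P\|P_Z) + E_t(P_{X,Z},R',P)\}. \tag{12} $$

Using (10), the equivalence of (12) and (6) is shown in Appendix D. This completes the proof of Theorem 1. ■

Proof of Lemma 3: Pick any $P \in \mathcal{P}_n(\mathcal{Z})$ and observe that for $\mathbf{z} \in \mathcal{T}_P$,

$$ \frac{\mathsf{W}^n(\mathbf{z}|\mathbf{x})}{P_Z^n(\mathbf{z})} = \exp\big[n\big(D(V_{\mathbf{x}|\mathbf{z}}\|P_X|P) - D(V_{\mathbf{x}|\mathbf{z}}\|P_{X|Z}|P)\big)\big]. $$

For every $P \in \mathcal{P}(\mathcal{Z})$ and stochastic matrix $Q: \mathcal{Z} \to \mathcal{X}$ define

$$ \Delta_{X,Z}(P;Q) = D(Q\|P_X|P) - D(Q\|P_{X|Z}|P). \tag{13} $$

Thus, using (4),

$$ U_n(\mathbf{z}|w) = \frac{1}{M'}\sum_{w'=1}^{M'}\exp[n\Delta_{X,Z}(P;V_{\mathbf{X}_{w,w'}|\mathbf{z}})]. \tag{14} $$

Let

$$ \bar{\mathcal{A}} = \{\Delta_{X,Z}(P;Q) \text{ for all conditional types } Q\} \subset \bar{\mathbb{R}}, \tag{15} $$

and observe that $|\bar{\mathcal{A}}| = O(n^{|\mathcal{X}||\mathcal{Z}|})$. Set $\mathcal{A} = \{a \in \bar{\mathcal{A}} : a > -\infty\}$ and for each $a \in \mathcal{A}$ define

$$ \mathcal{T}_a(\mathbf{z}) = \bigcup_{Q:\,\Delta_{X,Z}(P;Q)=a}\mathcal{T}_Q(\mathbf{z}) \subset \mathcal{X}^n, \tag{16} $$

where $\mathcal{T}_Q(\mathbf{z})$ is the $Q$-shell of $\mathbf{z}$ and the union is over conditional types $Q: \mathcal{Z} \to \mathcal{X}$ (and thus contains $O(n^{|\mathcal{X}||\mathcal{Z}|})$ shells). Now we can rewrite (14) as²

$$ U_n = U_n(\mathbf{z}|w) = \frac{1}{M'}\sum_{a\in\mathcal{A}} N_a\exp(na), \tag{17} $$

with $N_a = |\{w' : \mathbf{X}_{w,w'} \in \mathcal{T}_a(\mathbf{z})\}|$ denoting the number of codewords of $\mathcal{C}_w$ in $\mathcal{T}_a(\mathbf{z})$. Since the codewords are independent, $N_a$ is a Binomial$(M',p_a)$ random variable, where

$$ p_a = P_X^n(\mathcal{T}_a(\mathbf{z})) = \sum_{Q:\,\Delta_{X,Z}(P;Q)=a} P_X^n(\mathcal{T}_Q(\mathbf{z})) \doteq \exp\Big[-n\min_{Q:\,\Delta_{X,Z}(P;Q)=a} D(Q\|P_X|P)\Big]. \tag{18} $$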


In the above, the second equality follows since $Q$-shells are disjoint, and the third equality follows from [13, Lemma 2.6] (a similar approach is used in [15] to express a quantity of interest as a weighted sum of Binomial random variables).
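The identity at the start of this proof, $\mathsf{W}^n(\mathbf{z}|\mathbf{x})/P_Z^n(\mathbf{z}) = \exp[n\Delta_{X,Z}(P;V_{\mathbf{x}|\mathbf{z}})]$ for $\mathbf{z} \in \mathcal{T}_P$, is what turns (4) into (14). It can be sanity-checked numerically: the sketch below compares the symbol-by-symbol likelihood ratio with the type-based expression (13). It assumes a hypothetical three-output channel whose entries are all positive (so that $\Delta$ is finite); the names are illustrative only.

```python
import math
import random
from collections import Counter


def output_dist(p_x, w):
    """P_Z(z) = sum_x P_X(x) W(z|x); w[x][z] is the channel matrix."""
    return [sum(p_x[x] * w[x][z] for x in range(len(p_x))) for z in range(len(w[0]))]


def delta_of_joint_type(xs, zs, p_x, w):
    """Delta_{X,Z}(P; Q) of (13), with P the type of zs and Q = V_{xs|zs}."""
    n = len(zs)
    p_z = output_dist(p_x, w)
    z_counts = Counter(zs)
    d_prior = d_post = 0.0
    for (x, z), count in Counter(zip(xs, zs)).items():
        weight = count / n                # P(z) * Q(x|z), i.e. the joint type
        q = count / z_counts[z]           # Q(x|z), the conditional type
        post = p_x[x] * w[x][z] / p_z[z]  # P_{X|Z}(x|z)
        d_prior += weight * math.log(q / p_x[x])
        d_post += weight * math.log(q / post)
    return d_prior - d_post  # D(Q||P_X|P) - D(Q||P_{X|Z}|P)


def likelihood_ratio(xs, zs, p_x, w):
    """W^n(zs|xs) / P_Z^n(zs), computed symbol by symbol."""
    p_z = output_dist(p_x, w)
    return math.prod(w[x][z] / p_z[z] for x, z in zip(xs, zs))


if __name__ == "__main__":
    rng = random.Random(3)
    p_x = [0.3, 0.7]
    w = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]  # hypothetical channel, all entries positive
    n = 12
    xs = [rng.randrange(2) for _ in range(n)]
    zs = [rng.randrange(3) for _ in range(n)]
    print(likelihood_ratio(xs, zs, p_x, w), math.exp(n * delta_of_joint_type(xs, zs, p_x, w)))
```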

In Appendix C-A we compute the value of

$$ E_b(P_{X,Z},P,a) = \min_{Q:\,\Delta_{X,Z}(P;Q)=a} D(Q\|P_X|P) \tag{19} $$

and, in particular, show that

$$ E_b(P_{X,Z},P,a) \ge a, \tag{20} $$

with equality iff $a = D(P_{X|Z}\|P_X|P)$.
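Property (20) can be illustrated through the variational formula (43) of Proposition 4 (Appendix C-A). The sketch below assumes a hypothetical channel with positive entries and a hypothetical type $P$, and evaluates $E_b$ by maximizing over a finite grid of $\rho$. Such a grid only gives a lower bound on the supremum in (43), but it always contains $\rho = 0$, so $E_b \ge a$ holds by construction. The check of interest is that the gap closes at $a = D(P_{X|Z}\|P_X|P)$ and is strictly positive away from it.

```python
import math


def output_dist(p_x, w):
    """P_Z(z) = sum_x P_X(x) W(z|x); w[x][z] is the channel matrix."""
    return [sum(p_x[x] * w[x][z] for x in range(len(p_x))) for z in range(len(w[0]))]


def g0(p_x, w, p, rho):
    """G_0(P_{X,Z}, P, rho) of (9), in nats; p is a distribution on Z."""
    p_z = output_dist(p_x, w)
    value = 0.0
    for z, weight in enumerate(p):
        inner = 0.0
        for x, px in enumerate(p_x):
            post = px * w[x][z] / p_z[z]  # P_{X|Z}(x|z)
            if post > 0:
                inner += post ** (1 + rho) * px ** (-rho)
        value += weight * math.log(inner)
    return value


def e_b(p_x, w, p, a, rho_max=20.0, steps=8000):
    """E_b(P_{X,Z}, P, a) via (43), maximizing over a finite grid of rho.

    The grid contains rho = 0 but only lower-bounds the supremum in (43); in
    particular the true E_b is +inf when a lies outside the range of Delta.
    """
    rhos = [rho_max * (2 * k - steps) / steps for k in range(steps + 1)]
    return a + max(r * a - g0(p_x, w, p, r) for r in rhos)


def conditional_divergence(p_x, w, p):
    """D(P_{X|Z} || P_X | P): the value of a at which (20) holds with equality."""
    p_z = output_dist(p_x, w)
    total = 0.0
    for z, weight in enumerate(p):
        for x, px in enumerate(p_x):
            post = px * w[x][z] / p_z[z]
            if post > 0:
                total += weight * post * math.log(post / px)
    return total


if __name__ == "__main__":
    p_x = [0.3, 0.7]
    w = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]  # hypothetical channel
    p = [0.2, 0.5, 0.3]                     # hypothetical type of z
    a_star = conditional_divergence(p_x, w, p)
    for a in (a_star - 0.1, a_star, a_star + 0.1):
        print(f"a = {a:+.4f}: E_b - a ~ {e_b(p_x, w, p, a) - a:.6f}")
```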

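Partition $\mathcal{A} = \mathcal{A}_1 \cup \mathcal{A}_2$ as

$$ \mathcal{A}_1 = \{a \in \mathcal{A} : a < R'\}, \qquad \mathcal{A}_2 = \{a \in \mathcal{A} : a \ge R'\}, $$

and split (17) as

$$ U_n = \underbrace{\frac{1}{M'}\sum_{a\in\mathcal{A}_1} N_a\exp(na)}_{=S_n} + \underbrace{\frac{1}{M'}\sum_{a\in\mathcal{A}_2} N_a\exp(na)}_{=T_n}. $$

For non-negative $s$ and $t$ and $u = s + t$ we have

$$ \begin{aligned} u\ln(u) &= s\ln(u) + t\ln(u) \\ &= s\ln(s) + s\ln(1 + t/s) + t\ln(u) \\ &\le s\ln(s) + t(1 + \ln(u)), \end{aligned} $$

where the inequality follows since $\ln(1 + t/s) \le t/s$. Hence,

$$ \mathbb{E}[U_n\log(U_n)] \doteq \mathbb{E}[U_n\ln(U_n)] \le \mathbb{E}[S_n\ln(S_n)] + \mathbb{E}[T_n(1 + \ln(U_n))]. \tag{21} $$

Moreover, since $P_{\mathbf{Z}|W}(\mathbf{z}|w) \le 1$, we have $U_n \le 1/P_Z^n(\mathbf{z})$ and hence

$$ \ln(U_n) \le \ln(1/P_Z^n(\mathbf{z})) \le n\ln(1/p_0), $$

where $p_0 = \min_{z\in\mathcal{Z}} P_Z(z) > 0$. Thus, from (21) we have

$$ \mathbb{E}[U_n\ln(U_n)] \le \mathbb{E}[S_n\ln(S_n)] + (n\ln(1/p_0) + 1)\,\mathbb{E}[T_n] \doteq \mathbb{E}[S_n\ln(S_n)] + \mathbb{E}[T_n]. \tag{22} $$

We now upper-bound each of the above expectations to complete the proof.

First we note that for any constant $c \in \mathbb{R}$,

$$ \mathbb{E}[S_n\ln(S_n)] = \mathbb{E}\big[S_n\ln(S_n) + c(S_n - \mathbb{E}[S_n])\big]. \tag{23} $$

In particular,

$$ \mathbb{E}[S_n\ln(S_n)] = \mathbb{E}[\psi(S_n)], $$

where

$$ \psi(s) = s\ln(s) - (\ln(\mathbb{E}[S_n]) + 1)(s - \mathbb{E}[S_n]). \tag{24} $$

One can check that (see Fig. 2)

$$ \psi(s) \le \mathbb{E}[S_n]\ln(\mathbb{E}[S_n]) + \frac{(s - \mathbb{E}[S_n])^2}{\mathbb{E}[S_n]} \le \frac{(s - \mathbb{E}[S_n])^2}{\mathbb{E}[S_n]}, \tag{25} $$

where the last inequality follows since $\mathbb{E}[S_n] = 1 - \mathbb{E}[T_n] \le 1$, as $S_n$ and $T_n$ are both non-negative random variables.

Fig. 2. The function $\psi(s)$ defined in (24) and the upper-bound in (25). In the figure, $\bar{S}_n = \mathbb{E}[S_n]$.

Using (25) in (23) we conclude that

$$ \mathbb{E}[S_n\ln(S_n)] \le \frac{\mathrm{var}(S_n)}{\mathbb{E}[S_n]}. \tag{26} $$

We now have

$$ \mathbb{E}[S_n] = \sum_{a\in\mathcal{A}_1} p_a\exp(na) \doteq \exp\Big[-n\min_{a\in\mathcal{A}_1}\{E_b(P_{X,Z},P,a) - a\}\Big], \tag{27} $$

where the last equality follows since $|\mathcal{A}_1| = O(n^{|\mathcal{X}||\mathcal{Z}|})$. Furthermore,

$$ \begin{aligned} \mathrm{var}(S_n) &= \frac{1}{M'^2}\sum_{(a,a')\in\mathcal{A}_1^2}\exp[n(a+a')]\,\mathrm{cov}(N_a,N_{a'}) \\ &\overset{(a)}{\le} \frac{1}{M'^2}\sum_{(a,a')\in\mathcal{A}_1^2}\exp[n(a+a')]\sqrt{\mathrm{var}(N_a)}\sqrt{\mathrm{var}(N_{a'})} \\ &= \frac{1}{M'^2}\Big(\sum_{a\in\mathcal{A}_1}\exp[na]\sqrt{\mathrm{var}(N_a)}\Big)^2 \\ &\overset{(b)}{\doteq} \max_{a\in\mathcal{A}_1}\Big\{\frac{1}{M'^2}\exp[2na]\,\mathrm{var}(N_a)\Big\} \\ &\overset{(c)}{\le} \max_{a\in\mathcal{A}_1}\Big\{\frac{1}{M'}\exp[2na]\,p_a\Big\} \\ &\doteq \exp\Big[-n\min_{a\in\mathcal{A}_1}\{R' + E_b(P_{X,Z},P,a) - 2a\}\Big]. \end{aligned} \tag{28} $$

In the above, (a) follows by the Cauchy–Schwarz inequality, (b) follows since $|\mathcal{A}_1| = O(n^{|\mathcal{X}||\mathcal{Z}|})$, (c) follows since $\mathrm{var}(N_a) = M'p_a(1 - p_a) \le M'p_a$, and finally (28) follows from (18) and (19).

Similar to (27),

$$ \mathbb{E}[T_n] \doteq \exp\Big[-n\min_{a\in\mathcal{A}_2}\{E_b(P_{X,Z},P,a) - a\}\Big]. \tag{29} $$

Putting (27) and (28) in (26), together with (29) in (22), we conclude that

$$ E_t(P_{X,Z},R',P) = \min\{E_1(P_{X,Z},R',P) - \tilde{E}_1(P_{X,Z},R',P),\; E_2(P_{X,Z},R',P)\}, \tag{30} $$

where

$$ E_1(P_{X,Z},R',P) = \min_{a<R'}\{R' + E_b(P_{X,Z},P,a) - 2a\}, \tag{31} $$

$$ \tilde{E}_1(P_{X,Z},R',P) = \min_{a<R'}\{E_b(P_{X,Z},P,a) - a\}, \tag{32} $$

$$ E_2(P_{X,Z},R',P) = \min_{a\ge R'}\{E_b(P_{X,Z},P,a) - a\}. \tag{33} $$

We now observe that:

i. lower-bounding $R'$ by $a$ in (31) shows $E_1(P_{X,Z},R',P) \ge \tilde{E}_1(P_{X,Z},R',P) \ge 0$, where the last inequality is (20);

ii. by (20), one and only one of $\tilde{E}_1(P_{X,Z},R',P)$ or $E_2(P_{X,Z},R',P)$ is zero.

Thus (30) simplifies to (if $\tilde{E}_1 = 0$ this is immediate, and if $E_2 = 0$ both sides vanish by i.)

$$ E_t(P_{X,Z},R',P) = \min\{E_1(P_{X,Z},R',P),\; E_2(P_{X,Z},R',P)\}. \tag{34} $$

² Since $\mathbf{z}$ and $w$ are assumed to be fixed throughout the proof, we drop them from the argument of $U_n$ for the sake of brevity.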



























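In Appendix C-B we show that

$$ E_1(P_{X,Z},R',P) = \max_{\lambda\le1}\{\lambda R' - G_0(P_{X,Z},P,\lambda)\}, \tag{35a} $$

$$ E_2(P_{X,Z},R',P) = \max_{\lambda\ge0}\{\lambda R' - G_0(P_{X,Z},P,\lambda)\}. \tag{35b} $$

Since $\lambda \mapsto \lambda R' - G_0(P_{X,Z},P,\lambda)$ is concave (cf. Appendix E-A), the smaller of the two maxima in (35a) and (35b) equals its maximum over $0 \le \lambda \le 1$. Using the above in (34) therefore yields (10) and concludes the proof. ■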
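The elementary inequalities used in this proof can be spot-checked numerically. The sketch below (illustrative only, natural logarithms throughout) tests the splitting inequality behind (21) on random pairs $(s,t)$ and the bound (25) on a grid of $s$ for a few hypothetical values of $\mathbb{E}[S_n] \le 1$. It then tests the resulting bound (26) in the simplest case of a single term, $S = cN/M'$ with $N \sim$ Binomial$(M',p)$ and $cp = 1$; this single-term case is our simplification, not the general $S_n$ of the proof.

```python
import math
import random


def xlnx(v):
    """v ln v with the convention 0 ln 0 = 0."""
    return v * math.log(v) if v > 0 else 0.0


def split_inequality_holds(trials=20000, seed=1):
    """Check u ln u <= s ln s + t (1 + ln u), the step behind (21), for random s, t >= 0."""
    rng = random.Random(seed)
    for _ in range(trials):
        s, t = rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)
        u = s + t
        if u > 0 and xlnx(u) > xlnx(s) + t * (1 + math.log(u)) + 1e-12:
            return False
    return True


def psi_bound_holds(means=(0.05, 0.3, 1.0), s_max=10.0, steps=10001):
    """Check psi(s) <= (s - m)^2 / m from (24)-(25) on a grid of s >= 0, for 0 < m <= 1."""
    for m in means:
        for i in range(steps):
            s = s_max * i / (steps - 1)
            psi = xlnx(s) - (math.log(m) + 1) * (s - m)
            if psi > (s - m) ** 2 / m + 1e-12:
                return False
    return True


def binomial_example(m_prime=50, p=0.1, c=10.0):
    """Exact E[S ln S], var(S)/E[S] and E[S] for S = c N / M' with N ~ Binomial(M', p).

    This single-term S is a simplification of S_n, which mixes several N_a.
    """
    pmf = [math.comb(m_prime, k) * p ** k * (1 - p) ** (m_prime - k) for k in range(m_prime + 1)]
    values = [c * k / m_prime for k in range(m_prime + 1)]
    mean = sum(pk * v for pk, v in zip(pmf, values))
    var = sum(pk * (v - mean) ** 2 for pk, v in zip(pmf, values))
    e_s_ln_s = sum(pk * xlnx(v) for pk, v in zip(pmf, values))
    return e_s_ln_s, var / mean, mean


if __name__ == "__main__":
    print(split_inequality_holds(), psi_bound_holds())
    e_s_ln_s, bound, mean = binomial_example()
    print(f"E[S ln S] = {e_s_ln_s:.4f}, var(S)/E[S] = {bound:.4f}, E[S] = {mean:.4f}")
```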

V. Discussion 

We derived a lower-bound on the secrecy exponent of the wire-tap channel using i.i.d. random codes. Comparing (6) with [12, Equation (12)], we see that our exponent is equal to that of [12], which is the best lower-bound on the secrecy exponent among those reported in [5], [7], [12]. However, our proof is based on a pure i.i.d. random coding construction and does not require the ensemble of universal hash functions as an additional tool. While this manuscript was in review, it came to our attention that alternative derivations of the same lower-bound, also based on pure i.i.d. random coding constructions, are given in [16], [17].

Our proof is a generalization of that of [8, Section III-A]: instead of partitioning the set of output sequences $\mathcal{Z}^n$ into two classes of typical and atypical sequences, we partition it into $O(n^{|\mathcal{Z}|})$ type-classes to upper-bound the expected unnormalized Kullback–Leibler divergence between the output distribution and the desired product distribution $P_Z^n$. In addition, in Lemma 3 we bound the point-wise difference between those distributions at each $\mathbf{z} \in \mathcal{Z}^n$.

Furthermore, we believe that the method described here has merit in showing the doubly exponential nature of the concentration of the output distribution: as we see in (4), the output distribution $P_{\mathbf{Z}|W}(\cdot|w)$ is an average of $M'$ i.i.d. random variables. If the distribution of the summands were independent of $M'$, the average would concentrate around its mean exponentially fast in $M'$, that is, doubly exponentially fast in $n$. Although this is not the case, we see in the proof of Lemma 3 that among the polynomially many summands in (17), only the one corresponding to $a = D(P_{X|Z}\|P_X|P_{\mathbf{z}})$ has a significant contribution to the mean of $U_n(\mathbf{z}|w)$ (which is a normalized version of $P_{\mathbf{Z}|W}(\mathbf{z}|w)$); the rest all have exponentially small means. Applying the Chernoff bound to this particular term, we see that if $R' > D(P_{X|Z}\|P_X|P_{\mathbf{z}})$, the dominant term concentrates around its mean doubly exponentially fast in $n$. In particular, there exists a class of wire-tapper channels for which $U_n(\mathbf{z}|w)$ consists only of this dominant term.³

The achievability constructions of [5]–[7], [12], [16], [17] are based on i.i.d. random codes. It is an open question whether random constant-composition codes [13] will lead to a better secrecy exponent. We believe that our method is easily adaptable to other types of random coding (some ideas presented in [18] can also be useful in this direction). Another important subject in the context of the wire-tap channel is to derive non-trivial upper-bounds on the secrecy exponent.

The performance of a wire-tap code is measured via two quantities, the error probability and the information leakage, which are both shown to be exponentially decaying as a function of the block-length $n$. The trade-off between these exponents has been recently studied in [19].

³ This happens if, for every $z \in \mathcal{Z}$ and every $x \in \mathcal{X}$, either $\mathsf{W}(z|x) = 0$ or $\mathsf{W}(z|x) = e_z$ for some constant $e_z \le 1$ independent of $x$.

We conclude our discussion by remarking that, as shown in [2], for general channels $\mathsf{V}$ and $\mathsf{W}$, any message rate up to

$$ I(V;Y) - I(V;Z), $$

where $V - X - (Y,Z)$ form a Markov chain, is achievable. Our results (and also those of others cited) are straightforwardly extensible to the case when the channels are prefixed with a channel $P_{X|V}$ and the auxiliary random variable $V$ is used.
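Appendix A
Proof of (5)

The right-hand side of (4) is the average of identically distributed random variables. The mean of each of them is

$$ \mathbb{E}\Big[\prod_{i=1}^n \mathsf{W}(z_i|X_i)\Big] = \prod_{i=1}^n \mathbb{E}_{X_i\sim P_X}[\mathsf{W}(z_i|X_i)] = \prod_{i=1}^n \sum_{x\in\mathcal{X}} P_X(x)\mathsf{W}(z_i|x) = \prod_{i=1}^n P_Z(z_i). $$

In the above, the first equality follows since the codewords are sampled from the product distribution $P_X^n$. ■

Appendix B
Proof of Corollary 2

Let $M = \exp(nR)$ and construct $2M$ i.i.d. codebooks $\mathcal{C}_w$, $w \in [2M]$, of size $M' = \exp(nR')$ by sampling each codeword independently from the product distribution $P_X^n$. As we already described, in order to communicate $w \in [2M]$, Alice picks $w' \in [M']$ uniformly at random and transmits $\mathbf{X}_{w,w'}$ over the channel. The union of these codebooks, $\mathcal{C} = \bigcup_{w\in[2M]}\mathcal{C}_w$, is a random i.i.d. codebook of rate $R' + R + \frac{\log 2}{n}$. Hence, using this ensemble for communicating over $\mathsf{V}$, for each $w \in [2M]$, the expected decoding error probability is upper-bounded as

$$ \begin{aligned} \mathbb{E}\big[\Pr[\hat{W} \neq w \,|\, W = w]\big] &\le \mathbb{E}\big[\Pr[\{\hat{W} \neq W\} \cup \{\hat{W}' \neq W'\} \,|\, W = w]\big] \\ &\le \exp[-nE_r(P_X,\mathsf{V},R+R'+o(1))], \end{aligned} \tag{36} $$

due to [14, Theorem 5.6.2]. In the above, $\hat{W}$ and $\hat{W}'$ denote, respectively, the maximum likelihood estimates of $W$ and $W'$ given $\mathbf{Y}$, the output sequence of $\mathsf{V}$. Consequently,

$$ \mathbb{E}\Big[\frac{1}{2M}\sum_{w=1}^{2M}\Pr[\hat{W} \neq w \,|\, W = w]\Big] \mathrel{\dot{\le}} \exp[-nE_r(P_X,\mathsf{V},R+R')]. \tag{37} $$

Likewise, Theorem 1 implies

$$ \mathbb{E}\Big[\frac{1}{2M}\sum_{w=1}^{2M} D(P_{\mathbf{Z}|W=w}\|P_Z^n)\Big] \mathrel{\dot{\le}} \exp[-nE_s(P_X,\mathsf{W},R')]. \tag{38} $$

Therefore, there exists a code $\mathcal{C}^* = \bigcup_{w\in[2M]}\mathcal{C}_w^*$ in the ensemble using which we simultaneously have⁴

$$ \frac{1}{2M}\sum_{w=1}^{2M}\Pr[\hat{W} \neq w \,|\, W = w] \mathrel{\dot{\le}} \exp[-nE_r(P_X,\mathsf{V},R+R')], \tag{39} $$

$$ \frac{1}{2M}\sum_{w=1}^{2M} D(P_{\mathbf{Z}|W=w}\|P_Z^n) \mathrel{\dot{\le}} \exp[-nE_s(P_X,\mathsf{W},R')]. \tag{40} $$

Since each of the summands in (39) is non-negative, fewer than $M/2$ of the $2M$ summands can exceed four times their average (Markov's inequality). Hence there exists a subset $\mathcal{W}_1 \subset \{1,\dots,2M\}$ of cardinality $|\mathcal{W}_1| \ge \frac{3}{2}M$ such that, for all $w \in \mathcal{W}_1$,

$$ \Pr[\hat{W} \neq w \,|\, W = w] \mathrel{\dot{\le}} 4\exp[-nE_r(P_X,\mathsf{V},R+R')]. \tag{41} $$

Similarly, since the summands in (40) are non-negative, there exists a subset $\mathcal{W}_2 \subset \{1,\dots,2M\}$ of cardinality $|\mathcal{W}_2| \ge \frac{3}{2}M$ such that, for all $w \in \mathcal{W}_2$,

$$ D(P_{\mathbf{Z}|W=w}\|P_Z^n) \mathrel{\dot{\le}} 4\exp[-nE_s(P_X,\mathsf{W},R')]. \tag{42} $$

Since $|\mathcal{W}_1 \cap \mathcal{W}_2| \ge M$, there exists a subset $\mathcal{W} \subset \mathcal{W}_1 \cap \mathcal{W}_2$ of cardinality $|\mathcal{W}| = M$. The sub-code defined by the messages in $\mathcal{W}$, $\bigcup_{w\in\mathcal{W}}\mathcal{C}_w^*$, has rate $R$ and, using it, for any message distribution $P_W$ on $\mathcal{W}$, we have

$$ P_e = \sum_{w\in\mathcal{W}} P_W(w)\Pr[\hat{W} \neq w \,|\, W = w] \mathrel{\dot{\le}} \exp[-nE_r(P_X,\mathsf{V},R+R')], $$

due to (41), and

$$ \begin{aligned} I(W;\mathbf{Z}) &= D(P_{\mathbf{Z}|W}\|P_Z^n|P_W) - D(P_{\mathbf{Z}}\|P_Z^n) \\ &\le \sum_{w\in\mathcal{W}} P_W(w) D(P_{\mathbf{Z}|W=w}\|P_Z^n) \\ &\mathrel{\dot{\le}} \exp[-nE_s(P_X,\mathsf{W},R')], \end{aligned} $$

due to (42). ■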
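The counting behind (41) and (42) is Markov's inequality applied to the empirical distribution over messages: among $2M$ non-negative numbers, fewer than $M/2$ can exceed four times their average. Each "good" set therefore keeps at least $\frac{3}{2}M$ messages, and the two good sets share at least $M$. A toy check with synthetic, hypothetical per-message values (the names are ours):

```python
import random


def keep_below(values, factor=4.0):
    """Indices whose value is at most `factor` times the average of all the values."""
    average = sum(values) / len(values)
    return {i for i, v in enumerate(values) if v <= factor * average}


def expurgation_demo(m=100, seed=2):
    """Toy version of (39)-(42) with 2M synthetic, non-negative per-message figures."""
    rng = random.Random(seed)
    # Hypothetical heavy-tailed stand-ins for Pr[error | W = w] and D(P_{Z|W=w} || P_Z^n).
    errors = [rng.paretovariate(1.2) for _ in range(2 * m)]
    leakages = [rng.expovariate(1.0) ** 3 for _ in range(2 * m)]
    w1, w2 = keep_below(errors), keep_below(leakages)
    return len(w1), len(w2), len(w1 & w2)


if __name__ == "__main__":
    # Markov's inequality guarantees sizes[0], sizes[1] >= 3M/2 and sizes[2] >= M (M = 100).
    sizes = expurgation_demo()
    print(sizes)
```

The guarantee does not depend on how heavy-tailed the values are; the distributions above were chosen only to make the expurgated sets non-trivial.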

Appendix C
Derivation of Exponents for the Proof of Lemma 3

A. Derivation of $E_b$ and Its Properties

Proposition 4. Let $E_b(P_{X,Z},P,a)$ be defined as in (19). Then,

$$ E_b(P_{X,Z},P,a) = a + \max_{\rho\in\mathbb{R}}\{\rho a - G_0(P_{X,Z},P,\rho)\}, \tag{43} $$

where $G_0$ is defined in (9).

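⁴ Markov's inequality implies that for at least $\frac{2}{3}$ of the codes in the ensemble,

$$ \frac{1}{2M}\sum_{w=1}^{2M}\Pr[\hat{W} \neq w \,|\, W = w] \le 3\,\mathbb{E}\Big[\frac{1}{2M}\sum_{w=1}^{2M}\Pr[\hat{W} \neq w \,|\, W = w]\Big]. $$

Similarly, for at least $\frac{2}{3}$ of the codes in the ensemble,

$$ \frac{1}{2M}\sum_{w=1}^{2M} D(P_{\mathbf{Z}|W=w}\|P_Z^n) \le 3\,\mathbb{E}\Big[\frac{1}{2M}\sum_{w=1}^{2M} D(P_{\mathbf{Z}|W=w}\|P_Z^n)\Big]. $$

Therefore, for at least $\frac{1}{3}$ of the codes in the ensemble, both (39) and (40) hold simultaneously.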










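Proof: Let

$$ \imath_{X,Z}(x,z) = \log\Big(\frac{P_{X,Z}(x,z)}{P_X(x)P_Z(z)}\Big), \quad \forall (x,z) \in \mathcal{X}\times\mathcal{Z}, $$

denote the information density function of the joint distribution $P_{X,Z}$, for the sake of brevity. Using (13),

$$ \min_{Q:\,\Delta_{X,Z}(P;Q)=a} D(Q\|P_X|P) = a + \min_{Q:\,\Delta_{X,Z}(P;Q)=a} D(Q\|P_{X|Z}|P). \tag{44} $$

Now, we have

$$ \begin{aligned} \min_{Q:\,\Delta_{X,Z}(P;Q)=a} D(Q\|P_{X|Z}|P) &= \min_Q\Big\{D(Q\|P_{X|Z}|P) + \max_{\rho\in\mathbb{R}}\rho\big(a - \Delta_{X,Z}(P;Q)\big)\Big\} \\ &= \min_Q\max_{\rho\in\mathbb{R}}\big\{D(Q\|P_{X|Z}|P) + \rho\big(a - \Delta_{X,Z}(P;Q)\big)\big\} \\ &\overset{(*)}{=} \max_{\rho\in\mathbb{R}}\Big(\min_Q\{D(Q\|P_{X|Z}|P) - \rho\Delta_{X,Z}(P;Q)\} + \rho a\Big), \end{aligned} $$

where $(*)$ follows since $D(Q\|P_{X|Z}|P)$ is a convex function of $Q$ and $\Delta_{X,Z}(P;Q)$ is a linear function of $Q$. Therefore, $D(Q\|P_{X|Z}|P) - \rho\Delta_{X,Z}(P;Q)$ is also a convex function of $Q$, and we can swap the min and the max. Now,

$$ \begin{aligned} D(Q\|P_{X|Z}|P) - \rho\Delta_{X,Z}(P;Q) &= \sum_{z\in\mathcal{Z}} P(z)\sum_{x\in\mathcal{X}} Q(x|z)\log\Big(\frac{Q(x|z)}{P_{X|Z}(x|z)\exp[\rho\,\imath_{X,Z}(x,z)]}\Big) \\ &\ge -\sum_{z\in\mathcal{Z}} P(z)\log\Big(\sum_{x\in\mathcal{X}} P_{X|Z}(x|z)\exp[\rho\,\imath_{X,Z}(x,z)]\Big), \end{aligned} $$

with equality iff $Q(x|z) \propto P_{X|Z}(x|z)\exp[\rho\,\imath_{X,Z}(x,z)]$ (using the concavity of the logarithm). Therefore,

$$ \min_Q\{D(Q\|P_{X|Z}|P) - \rho\Delta_{X,Z}(P;Q)\} + \rho a = \rho a - \sum_{z\in\mathcal{Z}} P(z)\log\Big(\sum_{x\in\mathcal{X}} P_{X|Z}(x|z)\exp[\rho\,\imath_{X,Z}(x,z)]\Big) = \rho a - G_0(P_{X,Z},P,\rho), $$

where the last equality holds since $\exp[\imath_{X,Z}(x,z)] = P_{X|Z}(x|z)/P_X(x)$. Combining this with (44) yields (43). ■

Remark. It is easy to verify that $E_b(P_{X,Z},P,a)$ is a convex function of $a$. Furthermore, (44) implies $E_b(P_{X,Z},P,a) \ge a$, with equality at $a = D(P_{X|Z}\|P_X|P)$.

B. Derivation of $E_1$ and $E_2$

Proof of (35a): Using (31),

$$ \begin{aligned} E_1(P_{X,Z},R',P) &= \min_{a<R'}\{R' + E_b(P_{X,Z},P,a) - 2a\} \\ &= \min_{a\in\mathbb{R}}\Big\{R' + E_b(P_{X,Z},P,a) - 2a + \max_{\lambda\le0}\lambda(R' - a)\Big\} \\ &= \min_{a\in\mathbb{R}}\max_{\lambda\le0}\{(1+\lambda)R' + E_b(P_{X,Z},P,a) - (2+\lambda)a\} \\ &= \min_{a\in\mathbb{R}}\max_{\lambda\le1}\{\lambda R' + E_b(P_{X,Z},P,a) - (1+\lambda)a\} \\ &\overset{(*)}{=} \max_{\lambda\le1}\Big\{\lambda R' + \min_{a\in\mathbb{R}}\{E_b(P_{X,Z},P,a) - (1+\lambda)a\}\Big\}, \end{aligned} \tag{45} $$

where $(*)$ follows since $E_b(P_{X,Z},P,a)$ is convex in $a$. Using (43) we have

$$ \begin{aligned} \min_{a\in\mathbb{R}}\{E_b(P_{X,Z},P,a) - (1+\lambda)a\} &= \min_{a\in\mathbb{R}}\Big\{\max_{\rho\in\mathbb{R}}\{\rho a - G_0(P_{X,Z},P,\rho)\} - \lambda a\Big\} \\ &= \min_{a\in\mathbb{R}}\max_{\rho\in\mathbb{R}}\{(\rho - \lambda)a - G_0(P_{X,Z},P,\rho)\} \\ &\overset{(*)}{=} \max_{\rho\in\mathbb{R}}\Big\{\min_{a\in\mathbb{R}}\{(\rho - \lambda)a\} - G_0(P_{X,Z},P,\rho)\Big\}, \end{aligned} \tag{46} $$

where again $(*)$ follows since $G_0(P_{X,Z},P,\rho)$ is convex in $\rho$ (cf. Appendix E-A). We then note that the minimum of the linear term $(\rho - \lambda)a$ over the choices of $a$ is $-\infty$ unless $\rho = \lambda$. Therefore, the result of (46) is

$$ \min_{a\in\mathbb{R}}\{E_b(P_{X,Z},P,a) - (1+\lambda)a\} = -G_0(P_{X,Z},P,\lambda). \tag{47} $$

Plugging the above into (45) completes the proof. ■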

Proof of (35b): Similarly, using (33),

$$ \begin{aligned} E_2(P_{X,Z},R',P) &= \min_{a\ge R'}\{E_b(P_{X,Z},P,a) - a\} \\ &= \min_{a\in\mathbb{R}}\Big\{E_b(P_{X,Z},P,a) - a + \max_{\lambda\ge0}\lambda(R' - a)\Big\} \\ &= \min_{a\in\mathbb{R}}\max_{\lambda\ge0}\{\lambda R' + E_b(P_{X,Z},P,a) - (1+\lambda)a\} \\ &\overset{(*)}{=} \max_{\lambda\ge0}\Big\{\lambda R' + \min_{a\in\mathbb{R}}\{E_b(P_{X,Z},P,a) - (1+\lambda)a\}\Big\}, \end{aligned} \tag{48} $$

where $(*)$ follows since $E_b(P_{X,Z},P,a)$ is convex in $a$. Using (47) in (48) completes the proof. ■
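Combining (35a) and (35b) in (34) relies on an elementary fact: for the concave function $h(\lambda) = \lambda R' - G_0(P_{X,Z},P,\lambda)$, the smaller of its maxima over $\lambda \le 1$ and over $\lambda \ge 0$ equals its maximum over $[0,1]$, which is (10). A numerical sketch, assuming a hypothetical channel and type $P$, natural logarithms, and a $\lambda$-grid truncated to $[-5,5]$ (so the two unbounded ranges are only approximated):

```python
import math


def g0(p_x, w, p, lam):
    """G_0(P_{X,Z}, P, lambda) of (9), natural logarithms; w[x][z] is the channel."""
    p_z = [sum(p_x[x] * w[x][z] for x in range(len(p_x))) for z in range(len(w[0]))]
    value = 0.0
    for z, weight in enumerate(p):
        posts = [p_x[x] * w[x][z] / p_z[z] for x in range(len(p_x))]  # P_{X|Z}(.|z)
        inner = sum(q ** (1 + lam) * p_x[x] ** (-lam) for x, q in enumerate(posts) if q > 0)
        value += weight * math.log(inner)
    return value


def three_maxima(p_x, w, p, r_prime, grid=1000, span=5):
    """Maxima of h(lam) = lam*R' - G_0 over lam <= 1, lam >= 0 and 0 <= lam <= 1.

    lam runs over k/grid for |k| <= span*grid, so 0 and 1 are hit exactly; the
    unbounded ranges lam <= 1 and lam >= 0 are truncated to [-span, span].
    """
    lams = [k / grid for k in range(-span * grid, span * grid + 1)]
    h = [(lam, lam * r_prime - g0(p_x, w, p, lam)) for lam in lams]
    max_le_1 = max(v for lam, v in h if lam <= 1)      # the form of (35a)
    max_ge_0 = max(v for lam, v in h if lam >= 0)      # the form of (35b)
    max_0_1 = max(v for lam, v in h if 0 <= lam <= 1)  # the form of (10)
    return max_le_1, max_ge_0, max_0_1


if __name__ == "__main__":
    p_x = [0.3, 0.7]
    w = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]  # hypothetical channel
    p = [0.2, 0.5, 0.3]                     # hypothetical type P
    for r_prime in (0.0, 0.05, 0.15, 0.3):
        e1, e2, et = three_maxima(p_x, w, p, r_prime)
        print(f"R' = {r_prime}: min(E_1, E_2) = {min(e1, e2):.6f}, E_t = {et:.6f}")
```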

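Appendix D
Derivation of $E_s$

Plugging (10) into (12) we have

$$ \begin{aligned} \min_{P\in\mathcal{P}(\mathcal{Z})}\{E_t(P_{X,Z},R',P) + D(P\|P_Z)\} &= \min_{P\in\mathcal{P}(\mathcal{Z})}\Big\{\max_{0\le\lambda\le1}\{\lambda R' - G_0(P_{X,Z},P,\lambda)\} + D(P\|P_Z)\Big\} \\ &\overset{(*)}{=} \max_{0\le\lambda\le1}\Big\{\lambda R' + \min_{P\in\mathcal{P}(\mathcal{Z})}\{D(P\|P_Z) - G_0(P_{X,Z},P,\lambda)\}\Big\}, \end{aligned} $$

where $(*)$ follows since $G_0$, defined in (9), is a linear function of $P$ while $D(P\|P_Z)$ is convex in $P$, so we can swap the min and the max. The claim then follows by observing that

$$ \begin{aligned} D(P\|P_Z) - G_0(P_{X,Z},P,\lambda) &= \sum_{z\in\mathcal{Z}} P(z)\Big[\log\frac{P(z)}{P_Z(z)} - \log\Big(\sum_{x\in\mathcal{X}} P_{X|Z}(x|z)^{1+\lambda}P_X(x)^{-\lambda}\Big)\Big] \\ &\ge -\log\Big[\sum_{z\in\mathcal{Z}} P_Z(z)\sum_{x\in\mathcal{X}} P_{X|Z}(x|z)^{1+\lambda}P_X(x)^{-\lambda}\Big] = -F_0(P_X,\mathsf{W},\lambda), \end{aligned} $$

with equality if

$$ P(z) \propto P_Z(z)\sum_{x\in\mathcal{X}} P_{X|Z}(x|z)^{1+\lambda}P_X(x)^{-\lambda}, $$

using the concavity of the logarithm. ■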
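The variational step that closes this appendix, $\min_P\{D(P\|P_Z) - G_0(P_{X,Z},P,\lambda)\} = -F_0(P_X,\mathsf{W},\lambda)$ with the tilted minimizer $P(z) \propto P_Z(z)\sum_x P_{X|Z}(x|z)^{1+\lambda}P_X(x)^{-\lambda}$, can be checked numerically. A sketch with a hypothetical channel, a hypothetical $\lambda = 0.6$ and natural logarithms; the helper names are ours.

```python
import math
import random


def output_dist(p_x, w):
    """P_Z(z) = sum_x P_X(x) W(z|x); w[x][z] is the channel matrix."""
    return [sum(p_x[x] * w[x][z] for x in range(len(p_x))) for z in range(len(w[0]))]


def inner_sums(p_x, w, lam):
    """Return P_Z and g_z = sum_x P_{X|Z}(x|z)^(1+lam) P_X(x)^(-lam) for every z."""
    p_z = output_dist(p_x, w)
    g = []
    for z, pz in enumerate(p_z):
        posts = [p_x[x] * w[x][z] / pz for x in range(len(p_x))]  # P_{X|Z}(.|z)
        g.append(sum(q ** (1 + lam) * p_x[x] ** (-lam) for x, q in enumerate(posts) if q > 0))
    return p_z, g


def objective(p, p_x, w, lam):
    """D(P || P_Z) - G_0(P_{X,Z}, P, lam) in nats; terms with P(z) = 0 vanish."""
    p_z, g = inner_sums(p_x, w, lam)
    return sum(pt * (math.log(pt / p_z[z]) - math.log(g[z])) for z, pt in enumerate(p) if pt > 0)


def f0(p_x, w, lam):
    """F_0(P_X, W, lam) of Theorem 1 in nats."""
    p_z, g = inner_sums(p_x, w, lam)
    return math.log(sum(pz * gz for pz, gz in zip(p_z, g)))


def tilted_minimizer(p_x, w, lam):
    """P(z) proportional to P_Z(z) g_z(lam), the equality case above."""
    p_z, g = inner_sums(p_x, w, lam)
    weights = [pz * gz for pz, gz in zip(p_z, g)]
    total = sum(weights)
    return [v / total for v in weights]


if __name__ == "__main__":
    p_x = [0.3, 0.7]
    w = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]  # hypothetical channel
    lam = 0.6
    print(objective(tilted_minimizer(p_x, w, lam), p_x, w, lam), -f0(p_x, w, lam))
    rng = random.Random(5)
    raw = [rng.random() for _ in range(3)]
    print(objective([v / sum(raw) for v in raw], p_x, w, lam))  # never below -F_0
```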


Appendix E
Convexity Proofs

Lemma 5. Let $a_i \ge 0$ and $b_i > 0$, $i = 1,\dots,k$, be arbitrary real numbers. Then the function

$$ f(s) = \log\Big(\sum_{i=1}^k a_i b_i^s\Big) $$

is convex in $s$ for all $s \in \mathbb{R}$.

Proof: Pick $s_1 < s_2$ and $t \in (0,1)$. Let $\bar{t} = 1 - t$ and $s = ts_1 + \bar{t}s_2$. Then, Hölder's inequality implies

$$ \sum_{i=1}^k a_i b_i^s = \sum_{i=1}^k (a_i b_i^{s_1})^t (a_i b_i^{s_2})^{\bar{t}} \le \Big(\sum_{i=1}^k a_i b_i^{s_1}\Big)^t \Big(\sum_{i=1}^k a_i b_i^{s_2}\Big)^{\bar{t}}. $$

Taking the log of both sides of the above concludes the proof. ■


Lemma 6. Suppose $f_i(s)$, $i = 1,2,\dots,k$, are convex functions in $s$ and $a_i \ge 0$, $i = 1,2,\dots,k$, is a sequence of real numbers. Then,

(i) $f(s) = \sum_{i=1}^k a_i f_i(s)$ is convex in $s$.

(ii) $g(s) = \log\big(\sum_{i=1}^k a_i\exp[f_i(s)]\big)$ is convex in $s$.

Proof: The convexity of $f(s)$ is trivial. To prove the convexity of $g(s)$, let $s_1 < s_2$ and $s = ts_1 + \bar{t}s_2$ for some $t \in (0,1)$ (where $\bar{t} = 1 - t$). Then

$$ \begin{aligned} \sum_{i=1}^k a_i\exp[f_i(s)] &\le \sum_{i=1}^k a_i\exp[tf_i(s_1) + \bar{t}f_i(s_2)] \\ &= \sum_{i=1}^k (a_i\exp[f_i(s_1)])^t (a_i\exp[f_i(s_2)])^{\bar{t}} \\ &\le \Big(\sum_{i=1}^k a_i\exp[f_i(s_1)]\Big)^t \Big(\sum_{i=1}^k a_i\exp[f_i(s_2)]\Big)^{\bar{t}}, \end{aligned} $$

where the first inequality follows from the convexity of $f_i$ and the monotonicity of $\exp$, and the second inequality follows by Hölder's inequality. Taking the logarithm of both sides of the above proves (ii). ■
Convexity of the functions $F_0$ and $G_0$ in $\lambda$ is established using the above two lemmas as follows:

A. Convexity of $G_0$

For each $z$, set $a_x = P_{X|Z}(x|z)$ and $b_x = P_{X|Z}(x|z)/P_X(x)$ in Lemma 5, and then use Lemma 6 part (i) with weights $P(z)$. ■

B. Convexity of $F_0$

For each $z$, set $a_x = P_{X|Z}(x|z)$ and $b_x = P_{X|Z}(x|z)/P_X(x)$ in Lemma 5, and then use Lemma 6 part (ii) with weights $P_Z(z)$. ■
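Lemmas 5 and 6 can be spot-checked by testing the convexity inequality on random chords. The sketch below uses randomly drawn, purely hypothetical coefficients $a_i \ge 0$, $b_i > 0$; the inner functions mixed in Lemma 6 (ii) are themselves of the Lemma 5 form, mirroring how $F_0$ is built. A failing check on a function known to be non-convex ($\sin$ on $[-3,3]$) guards against a vacuous test.

```python
import math
import random


def log_sum_power(a, b, s):
    """f(s) = log(sum_i a_i * b_i**s) from Lemma 5 (a_i >= 0 not all zero, b_i > 0)."""
    return math.log(sum(ai * bi ** s for ai, bi in zip(a, b)))


def log_weighted_exp(weights, funcs, s):
    """g(s) = log(sum_i a_i * exp(f_i(s))) from Lemma 6 (ii)."""
    return math.log(sum(ai * math.exp(f(s)) for ai, f in zip(weights, funcs)))


def convex_on_random_chords(f, lo=-3.0, hi=3.0, trials=2000, seed=4):
    """True if f(t*s1 + (1-t)*s2) <= t*f(s1) + (1-t)*f(s2) at all sampled (s1, s2, t)."""
    rng = random.Random(seed)
    for _ in range(trials):
        s1, s2, t = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        if f(t * s1 + (1 - t) * s2) > t * f(s1) + (1 - t) * f(s2) + 1e-9:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.random() for _ in range(5)]           # hypothetical a_i >= 0
    b = [rng.uniform(0.1, 3.0) for _ in range(5)]  # hypothetical b_i > 0

    def f(s):
        return log_sum_power(a, b, s)

    # Convex inner functions f_k(s) = log(sum_i a_i (b_i^(k+1))^s), each of the Lemma 5 form.
    inner = [lambda s, k=k: log_sum_power(a, [bi ** (k + 1) for bi in b], s) for k in range(3)]

    def g(s):
        return log_weighted_exp([0.2, 0.5, 0.3], inner, s)

    print(convex_on_random_chords(f), convex_on_random_chords(g), convex_on_random_chords(math.sin))
```

A random-chord test can only refute convexity, never prove it; it complements, rather than replaces, the Hölder arguments above.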




